Using the Triangle Inequality to Accelerate k-Means

Author

  • Charles Elkan
Abstract

The k-means algorithm is by far the most widely used method for discovering clusters in data. We show how to accelerate it dramatically, while still always computing exactly the same result as the standard algorithm. The accelerated algorithm avoids unnecessary distance calculations by applying the triangle inequality in two different ways, and by keeping track of lower and upper bounds for distances between points and centers. Experiments show that the new algorithm is effective for datasets with up to 1000 dimensions, and becomes more and more effective as the number k of clusters increases. For k ≥ 20 it is many times faster than the best previously known accelerated k-means method.
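The kind of pruning the abstract describes can be illustrated with a minimal sketch. It assumes Euclidean distance and uses only one consequence of the triangle inequality: if d(c, c') ≥ 2·d(x, c), then d(x, c') ≥ d(x, c), so center c' cannot be closer to x than c is. The function names (`dist`, `assign_with_pruning`, `assign_brute_force`) are hypothetical and chosen for illustration. The sketch covers a single assignment pass and omits the lower and upper bound bookkeeping mentioned in the abstract, so it is not the full accelerated algorithm.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping distance
    computations that the triangle inequality shows are unnecessary.

    Returns (labels, skipped), where skipped counts the
    point-to-center distances that were never computed.
    """
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels = []
    skipped = 0
    for x in points:
        best = 0
        best_d = dist(x, centers[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(x, c_best), then
            # d(x, c_j) >= d(x, c_best), so c_j cannot be strictly closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            d = dist(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def assign_brute_force(points, centers):
    """Reference assignment that computes every point-to-center distance."""
    return [
        min(range(len(centers)), key=lambda j: dist(x, centers[j]))
        for x in points
    ]
```

On well-separated data the pruned pass returns the same labels as the brute-force pass while computing fewer distances, which matches the abstract's claim that the result is unchanged and only the work is reduced.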


Similar resources


Elkan's k-Means Algorithm for Graphs

This paper proposes a fast k-means algorithm for graphs based on Elkan’s k-means for vectors. To accelerate the k-means algorithm for graphs without trading computational time against solution quality, we avoid unnecessary graph distance calculations by exploiting the triangle inequality of the underlying distance metric. In experiments we show that the accelerated k-means for graphs is faster ...


On the metric triangle inequality

A non-contradictible axiomatic theory is constructed under the local reversibility of the metric triangle inequality. The obtained notion includes the metric spaces as particular cases and the generated metric topology is T$_{1}$-separated and generally, non-Hausdorff.


Elkan's k-Means for Graphs

This paper extends k-means algorithms from the Euclidean domain to the domain of graphs. To recompute the centroids, we apply subgradient methods for solving the optimization-based formulation of the sample mean of graphs. To accelerate the k-means algorithm for graphs without trading computational time against solution quality, we avoid unnecessary graph distance calculations by exploiting the...


Parallel K-Means Clustering with Triangle Inequality

Clustering divides data objects into groups to minimize the variation within each group. This technique is widely used in data mining and other areas of computer science. K-means is a partitional clustering algorithm that produces a fixed number of clusters through an iterative process. The relative simplicity and obvious data parallelism of the K-means algorithm make it an excellent candidate ...



Publication date: 2003